Dynamic Vocal Tract Length Normalization in Speech Recognition

نویسندگان

  • Daniel Elenius
  • Mats Blomberg
چکیده

A novel method to account for dynamic speaker characteristic properties in a speech recognition system is presented. The estimated trajectory of a property can be constrained to be constant or to have a limited rate-of-change within a phone or a sub-phone state. The constraints are implemented by extending each state in the trained Hidden Markov Model by a number of property-value-specific sub-states transformed from the original model. The connections in the transition matrix of the extended model define possible slopes of the trajectory. Constraints on its dynamic range during an utterance are implemented by decomposing the trajectory into a static and a dynamic component. Results are presented on vocal tract length normalization in connected-digit recognition of children's speech using models trained on male adult speech. The word error rate was reduced compared with the conventional utterance-specific warping factor by 10% relative.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Generally speaking, the speaker-dependence of a speech recognition system stems from speaker-dependent speech feature. The variation of vocal tract length and/or shape is one of the major source of inter-speaker variations. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) explore the bilinear warping VTL...

متن کامل

Rapid vocal tract length normalization using maximum likelihood estimation

Recently, vocal tract length normalization (VTLN) techniques have been developed for speaker normalization in speech recognition. This paper proposes a new VTLN method, in which the vocal tract length is normalized in the cepstrum space by means of linear mapping whose parameter is derived using maximumlikelihood estimation. The computational costs of this method are much lower than that of suc...

متن کامل

On the Use of a Wave-Reflection Model for the Estimation of Spectral Effects due to Vocal Tract Length Changes with Application to Automatic Speech Recognition

Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss ef...

متن کامل

Real-Time Vocal Tract Length Normalization in a Phonological Awareness Teaching System

Speaker normalization in a speech recognition can significantly improve speech recognition accuracy. One such method, vocal tract length normalization (VTLN), is especially useful when the system has to work reliably for males, females and children. It is just this situation with our phonological awareness teaching system, the “SpeechMaster”, which aims at real-time phoneme recognition and feed...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010